ABSTRACT 

Methodological approaches to protamine P1 sequence determination have evolved from the initial protein sequencing 
methods to robust cloning and PCR-based techniques. Twenty-seven different mammalian-avian P1 type protamine 
genes and 32 different P1 amino acid sequences are now available, allowing detailed phylogenetic analysis and the study of 
transcriptional control mechanisms. All mammalian-avian P1 type protamines contain a well-conserved N-terminus with 
the consensus "ARYR" followed by alternating "S/T-S-S" phosphorylatable residues. Eutherian mammalian P1s contain 
cysteine residues, whereas bird, prototherian and metatherian protamines lack cysteine. Thus cysteine appeared after the 
divergence of the marsupial, monotreme and placental lineages. Overall, detailed phylogenetic analysis of the gene 
sequences indicates that the evolution of P1 genes agrees with the expected species evolution, supporting the view that 
these genes have evolved vertically. 


SUMMARY 

Sequence, evolution and transcriptional regulation of P1 type protamines of birds and mammals 

Methodological approaches to determining protamine P1 sequences have evolved from the first protein sequencing 
methods to reliable techniques based on cloning and the polymerase chain reaction. Twenty-seven different P1 type 
protamine genes from birds and mammals and thirty-two different amino acid sequences are now available and permit 
phylogenetic analysis and study of the mechanisms of transcriptional control. All P1 type protamines of birds and 
mammals contain a well-conserved N-terminal end with the consensus sequence "ARYR" followed by the alternating 
sequence of phosphorylatable residues "S/T-S-S". The P1 protamines of eutherian mammals contain cysteine residues, 
whereas those of birds, prototherians and metatherians do not. Cysteine therefore appeared after the divergence of the 
marsupial, monotreme and placental lineages. A general and detailed phylogenetic analysis of the gene sequences 
indicates that the evolution of the P1 genes agrees with the expected evolution of the species, which indicates that these 
genes have evolved vertically. 


Protamines are small (30-60 amino acids), highly positively charged proteins (40-70% 
arginine) which appear at the late stages of spermatogenesis in many, but not all, animal species and 
in some plant species [6, 11, 16, 21-23, 25, 43, 50, 52, 60, 69-73]. In those species in which they 
occur, such as all mammals [6, 52], birds [13, 52, 53], some teleost fish [11, 16], some 
reptiles [25, 70] and amphibians [25], they replace most of the histones during spermiogenesis and 
become the major sperm nuclear protein [47, 48, 52, 54]. There are two basic groups of 
protamines in mammals: the P1 protamines, which have been found in all mammalian species that 
have been analyzed, and the P2 protamines, which have been found in humans [5, 8, 42, 79, 80] 
and a limited number of other mammals such as mouse, guinea pig and stallion [10, 59, 66]. 
However, pro-P2 protamine genes have been sequenced from eight species of primates [66]. 
Both types of protamines contain cysteine, which can form disulphide bonds and contribute to the 
stability of the condensed sperm nucleus. Bird protamines lack cysteine [13, 45, 49, 52], although 
they are clearly related to mammalian P1 protamines, as several identical amino acid sequence 
motifs are present in both. Because of the high variability of protamines and protamine 
genes it is very difficult at present to explain the evolution of these proteins within an entire 
phylum. In many cases the limited number of sequences available precludes their connection into 
a coherent evolutionary pathway. Thus the focus of this review has been placed on the 
avian-mammalian P1 type protamines, for which a considerable amount of information is 
now available. Other papers and reviews cover other vertebrate or invertebrate groups ([4, 12, 13, 
16, 25, 43, 49, 52, 60, 70, 71, 73]; see also CHIVA, SAPERAS, CACERES & AUSIO, this 
volume, and PRATS & CORNUDELLA, this volume). Protamine genes are a clear example of 
highly tissue-specific genes. However, the mechanisms that direct their specific expression in the 
testis are not fully understood [18, 22, 50, 52, 80]. Thus the last section of this review covers the 
progress made in understanding the transcriptional control of the P1 genes. 

RESULTS AND DISCUSSION 

Methodological approaches to protamine P1 sequence determination 

The methods initially available to sequence protamines were based on end group analysis, 
proteolytic digestion, and the isolation, sequencing and overlapping of the protamine peptides. The 
presence of several arginine tracts with very similar sequences in each protamine made this 
approach technically difficult. The first complete avian-mammalian P1 type sequences reported 
using these methods corresponded to bull [15] (Table 1), Gallus domesticus [45] and boar [76]. 
Some discrepancies in the initially reported sequences were found when the corresponding 
protamines were re-examined by automated micro-sequencing or by cloning of the protamine 
cDNAs and genes [37, 40, 49]. Subsequently the use of automated protein micro-sequencing led 
to the determination of the sequences of human P1 [41], stallion P1 [2, 9], ram [71] and rabbit, 
goat and rat [3]. Partial sequences corresponding to a few N-terminal residues have also been 
reported for many mammalian protamines [6] (Table 1). 

Simultaneously with the introduction of automated micro-sequencing, methods for 
cDNA synthesis, cloning and sequencing were also developed and applied to protamine genes 
(Table 1). The first mammalian protamine cDNA sequence corresponded to mouse P1 [29]. Since 
no probes were initially available to screen the cDNA library, this initial sequence was obtained by 
characterization of selected clones preferentially expressed in spermatids [28]. Subsequently, the 
use of the mouse P1 cDNA clone as a probe led to the determination of the sequence of bovine 
protamine P1 cDNA [35]. Simultaneously, the bovine protamine P1 cDNA sequence was also 
independently obtained using oligonucleotides designed from the previously known amino acid 
sequence [31]. The mouse P1 cDNA also led to the isolation of the boar protamine P1 cDNA [37] 
and rat P1 cDNA [30]. The bovine probe led to the isolation and sequencing of the human 
protamine P1 cDNA [36]. However, the mammalian probes would not recognize the avian 
protamine genes because of marked divergence in the nucleotide sequences between these species. 
Thus the cDNA sequence corresponding to rooster protamine was obtained by random 
sequencing of 210 clones from a rooster testis cDNA library until the sequence of one clone 
predicted an amino acid sequence similar to that previously reported at the protein level for galline 
[55]. The availability of a cDNA probe from galline led to the rapid isolation and sequencing of 
another avian protamine, that of the quail (Coturnix japonica) [53], as well as to the isolation and 
sequencing of the chicken genomic clones [49]. The same approach, using cDNAs as probes to 
screen genomic libraries, was also followed to obtain the genomic sequences corresponding to bull 
P1 [32], mouse P1 [24], human P1 [18] and boar P1 [26] (Table 1). The availability of probes 
for all these genes also led to the determination of their copy number, which proved to be one copy 
per haploid genome for P1 protamines in mammals, two identical genes (coding region) per 
haploid genome in Gallus domesticus, and one copy of P2 per haploid genome in mammals. This 
indicated that the numerous basic protamine-type molecules present in the sperm nucleus of 
mammals (P1, P2, P3, P4 and others) corresponded to only two types of protamines (P1 and P2). The 
P1 genes are located adjacent to other spermatid-specific genes in the mammalian genome [19, 34, 
46]. As some mammalian protamine genes were sequenced by independent laboratories, some 
minor discrepancies in the reported sequences also emerged, such as in the bull genes [31, 35] 
and between the human P1 sequence initially reported [18] and that subsequently redetermined in 
several independent individuals [62]. 

Table 1. — Chronological list of reports on protamine P1 sequences, indicating the species and methods used for 
sequencing. 

Protamine sequence/s reported                               Method                    Reference
Bull (Bos taurus)                                           Protein sequencing        [15]
Galline (Gallus domesticus)                                 Protein sequencing        [45]
Rat, partial (Rattus norvegicus)                            Protein sequencing        [27]
Boar (Sus scrofa)                                           Protein sequencing        [76]
Ram (Ovis aries)                                            Protein sequencing        [71]
Human (Homo sapiens)                                        Protein sequencing        [41]
Mouse (Mus musculus)                                        cDNA                      [29]
Bull (Bos taurus) (corrected)                               Protein sequencing        [31]
Bull (Bos taurus)                                           cDNA                      [31]
Bull (Bos taurus)                                           cDNA                      [35]
Stallion (Equus caballus)                                   Protein sequencing        [9]
Stallion (Equus caballus)                                   Protein sequencing        [2]
Human (Homo sapiens)                                        cDNA                      [36]
Galline (Gallus domesticus)                                 cDNA                      [55]
Bull (Bos taurus)                                           Genomic phage library     [32]
Mouse (Mus musculus)                                        Genomic phage library     [24]
Mouse (Mus musculus)                                        Protein sequencing        [10]
Rabbit (Oryctolagus cuniculus)                              Protein sequencing        [3]
Goat (Capra hircus)                                         Protein sequencing        [3]
Rat (Rattus norvegicus)                                     Protein sequencing        [3]
Boar (Sus scrofa)                                           cDNA                      [37]
Chicken (Gallus domesticus)                                 Genomic cosmid library    [49]
Quail (Coturnix japonica)                                   cDNA                      [53]
Rat (Rattus norvegicus)                                     cDNA                      [30]
Wallaby (Cricetulus migratorius)                            Protein sequencing        [7]
Dwarf hamster (Phodopus sungorus)                           Protein sequencing        [6]
Rhesus monkey, partial (Macaca mulatta)                     Protein sequencing        [6]
Human (Homo sapiens)                                        Genomic cosmid library    [18]
Marmoset (Saguinus imperator)                               PCR with genomic DNA      [63]
Boar (Sus scrofa)                                           Genomic phage library     [26]
Whale (Orcinus orca)                                        PCR                       [1]
Human (Mediterranean, Sudanese, Korean, American Indian)    PCR                       [62]
Rat, 5' region (Rattus norvegicus)                          PCR                       [64]
Guinea pig, 5' region (Cavia porcellus)                     PCR                       [64]
Gorilla, 5' region (Gorilla gorilla)                        PCR                       [64]
Orangutan, 5' region (Pongo pygmaeus)                       PCR                       [64]
Anubis baboon, 5' region (Papio doguera)                    PCR                       [64]
Red monkey, 5' region (Cercopithecus patas)                 PCR                       [64]
Opossum (Didelphis marsupialis)                             PCR                       [78]
Common chimpanzee (Pan troglodytes)                         PCR                       [68]
Pygmy chimpanzee (Pan paniscus)                             PCR                       [68]
Gorilla (Gorilla gorilla)                                   PCR                       [68]
Orangutan (Pongo pygmaeus)                                  PCR                       [68]
Gibbon (Hylobates lar)                                      PCR                       [68]
Red monkey (Cercopithecus patas)                            PCR                       [68]
Marmoset (Saguinus imperator)                               PCR                       [68]
Red howler (Alouatta seniculus)                             PCR                       [68]
Platypus (Ornithorhynchus anatinus)                         PCR                       [67]
Echidna (Tachyglossus aculeatus)                            PCR                       [67]
Rat (Rattus norvegicus)                                     PCR                       [61]
Guinea pig (Cavia porcellus)                                PCR                       [61]
Cat (Felis catus)                                           PCR                       [61]
Bear (Ursus americanus)                                     PCR                       [61]
Elephant (Elephas)                                          PCR                       [61]
Horse (Equus caballus)                                      PCR                       [61]
Camel (Camelus dromedarius)                                 PCR                       [61]
Deer (Odocoileus virginianus)                               PCR                       [61]
Elk (Cervus elaphus)                                        PCR                       [61]
Moose (Alces alces)                                         PCR                       [61]
Gazelle (Gazella dorcas)                                    PCR                       [61]

The availability of the P1 genomic sequences from human, bull, mouse and boar allowed 
their comparison in a search for conserved flanking sequences from which to design consensus 
oligonucleotides. This approach proved to be extremely efficient, leading to the isolation and 
sequencing of the Saguinus imperator P1 protamine [63], Orcinus orca P1 [1], and subsequently 
those of several primates (common chimpanzee, pygmy chimpanzee, gorilla, orangutan, gibbon, 
Cercopithecus patas, Alouatta seniculus) [68], several human individuals (Mediterranean, 
Korean, Sudanese, American Indian) [62], additional eutherian mammals (rat, guinea pig, cat, 
bear, elephant, horse, camel, deer, elk, moose, gazelle) [61] and the monotremes, platypus and 
echidna [67]. The determination of a wallaby partial amino acid sequence [7] by protein 
micro-sequencing led to the sequencing of the opossum protamine P1 [77, 78]. A similar PCR-based 
approach was followed to amplify and sequence the promoter region of the rat, guinea pig, 
gorilla, orangutan, anubis baboon and red monkey [64]. 
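
The derivation of consensus oligonucleotides from conserved flanking sequences can be illustrated with a minimal Python sketch. It assumes that the flanking sequences are already aligned without gaps; the three short sequences and the function names are hypothetical, chosen only for illustration, and do not reproduce the primers used in the studies cited above. Each alignment column is collapsed to its IUPAC degenerate code, and the degeneracy of the resulting oligonucleotide is the number of distinct sequences it represents.

```python
# Map each set of bases observed in an alignment column to its IUPAC code.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("AT"): "W", frozenset("CG"): "S",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_consensus(aligned):
    """Collapse gap-free aligned DNA sequences into one degenerate consensus."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(IUPAC_CODES[frozenset(column)] for column in zip(*aligned))


def degeneracy(oligo):
    """Number of distinct sequences represented by a degenerate oligonucleotide."""
    sizes = {code: len(bases) for bases, code in IUPAC_CODES.items()}
    total = 1
    for base in oligo:
        total *= sizes[base]
    return total


# Hypothetical toy flanking sequences (not real protamine gene data).
toy_flanks = [
    "ATGGCCAGATAC",
    "ATGGCCAGGTAC",
    "ATGGCTAGATAC",
]

consensus = degenerate_consensus(toy_flanks)
print(consensus, degeneracy(consensus))
```

Under these assumptions, stretches with low degeneracy are the natural candidates for primer sites; gapped alignments would need additional handling that this sketch omits.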

Although PCR from genomic DNA is a very valuable tool for the amplification and 
sequencing of new protamine genes, it does not provide information on whether the sequenced 
genes are expressed (for instance, whether a sequenced gene is a pseudogene). In the case of the P1 
genes, the fact that all mammalian species in which the sperm nuclear protein content has been 
analyzed contain protamine P1, together with the single copy number of the P1 genes in mammals, 
suggests that most (if not all) of the mammalian species whose P1 sequence has been determined 
by PCR also express the sequenced gene. However, this could be a limitation in the prediction of 
functional properties based on the derived amino acid sequence in the case of those proteins (such 
as P2 protamine) which are not ubiquitously expressed in mammals [66]. 

The availability, at present, of a large number of P1 sequences should allow the design of new 
primers with which to amplify and sequence the P1 genes of species which have so far 
remained elusive. In the case of the P1 genes the PCR approach has been successful in the 
amplification of sequences corresponding to members of the class from which the oligonucleotides were 
derived (e.g. mammals), but has failed in the amplification of the P1 genes corresponding to other 
classes (e.g. reptiles or amphibians). Thus determination of the sequences of protamine genes in 
other vertebrate classes (or in other phyla) will probably require laborious groundwork based on 
either protein micro-sequencing (followed by oligonucleotide design and PCR) or cDNA-based 
approaches [29, 55]. However, once one or a few nucleotide sequences become available in the 
additional phyla and classes [4; see PRATS & CORNUDELLA, this volume], the same PCR-based 
approach successfully used for mammalian-avian P1s should also work in the determination of 
protamine sequences corresponding to most of the members of other classes. 
ARYRCCLTH- - SG SRCERRRRRRCRRRRRR- FG-- - RRRRRR-VCCRR*- - - YTVIRCTRQ 

ARYRCCLTH- -SRSRCRRRRRRRCRRRRPJl- FG--RRRRRR- --VCCRR-YTWRCTRQ 

ARYRCCLTH - - SRSRCRRRRRRRCRRRRRR-FG--RRRRRR- --VCCRR-YTWRCTRQ 

ARNRC -RSP - - SQSRCRRPRRR^CRR -RIR^CC - ——RRQ - RR VCCRR-YTTTRCARQ 

ARYRCCR SH~ - SRSRCRPRRRR- CRRRRRR- CC - - -PRR-RRA--VCCRR-YTVTRCRRC 

ARYRCCRSQ—- SQSRCRRRRRRRCRRRRRR- SV--R-Q-KR VSCRR YTVLRCRRRR 

ARYRCCRSK— SR5 RCRRRRR- RCRRRRRR - CC -R- RRRR--- RCCRRRRS YT - IRCKKY 

ARYRCCRSK- 'SRSRCRRRRR-RCRRRRRR-CC----H-RRRR— RCCRKRRSYT-FRCKRY 

VRYRCCRSQ— SRS RCRRRRR-RCRRRKRR - CC-QRRRVR- * -KCCRR--TYT-LRCRRY 

ARYRCCRSQ--SRSRCYRQRR-^SPRRKRQ-SC-— - --QTQRRAM’-RCCRRR--SR-LRRRRH 

ARYRCCRSQ—SRSRCYRQRQ -RSRRRKRQ - SC-- -QTQRRAM- -RCCRHR- -SR-MRRRRH 

ARYRCCRSQ—SRSRCYRQRQ- TSRRRKRR- SC - -QTQRRAM- -RCCRRR- -NR-LRRRKH 

ARYRCCRSQ—SRSRYYRQRQ-RSRRRRRR- SC-QTRRRAM- -RCCRPR- -YR-PRCRRH 

ARYRCCRSQ - - SQSRCCRRRQ-RCHRKRRR-CC-QTRRRAM—RCCRRR—YR-LRCRRH 

ARYRCCRSQ—SRSRCCRQRR-RCRRRRRR- RC-RARRRAM- -KCCRRR- -YR-LRCRRY 

ARYRCCRSQ—SRSRCYRRGQ- RSRRRRRR- SC --QTRRRAM- -RCCRPR- -YR-LRRKRH 

ARYRCCRSRSLSRSRCYRQRP- RCRRRRRR-SC— -RRP-RAS—RCCRRR- - YR-LRRRRY 

ARYRCCRSQ—SRSR©YRQKR“RGRRRRKR-TC-—KRR-RAS— RCCRRR- -YK-LTCRRY 

ARYRR- RSRSRSRSR YGRRRRR - SRS RRRR- SRRRRRRRG- - - RRG - - RGYHRRS PHRRRRRRRR 

ARFRPSRSR--SRSLYRRRRR—SRRQRSRRGGRQTGPRKITRRGRGRGKSRRRRGRRSMRSSRRRRRRRRM 

ARFRRSRSR- - SRSLYRRRRR- -SRR-GGRQTR SRKLSR- SRRRGRSRRRKGRRS RRS SRRS—RRRN 

ARYRRSRTR- - S RS PRSRRRRRRSGRRR SPRRRRRYGSARRSRROTGGRRRR-YGSFilRRPRRY 

ARYRRTRTR- - SRSR-RRRRSRRRR -- - S SRR- RRYGRSRR5YRSVGRRRRR- YGRRRRRRRRY 


Fig. 1. — Alignment of the reported avian-mammalian P1 amino acid sequences (see Table 1). 


Implications for protamine P1 gene evolution 

The coincidence of the four C-terminal amino acids between mammalian P1s and galline 
(the protamine from Gallus domesticus) has been known for nearly two decades [45]. However, 
because of the lack of cysteine residues in bird protamines and the presence of these amino acids 
in mammalian protamines, the two types of protamines had classically been classified as belonging to 
different types. According to Bloch's classification, galline was type 1 (or a true protamine) 
whereas mammalian protamines were type 2 (stable or keratinous protamines). The similarity 
between mammalian and rooster protamine became stronger when the sequence predicted from the 
genomic sequence (and in accordance with the resequencing of the N-terminus of the protein) [49] 
(Fig. 1) was used instead of the sequence initially reported [45]. For instance, the single threonine 
residue present in galline occupies exactly the same position as that present in ram and bull [52] 
(Fig. 1). The determination of the quail protamine sequence [53] also revealed the presence of the 
N-terminal ARYR motif and a size (56 residues) closer to that of mammalian P1s (50 residues) than 
to that of galline (61 residues; Fig. 1). An alternating triple phosphorylatable site (T/S-S-S) is 
also found at positions 9-15 in all avian-mammalian P1 protamines (Figs 1, 2) [6, 20, 52, 57, 
58]. The size of bird protamines appears consistently similar to that of mammals (Fig. 1) [13, 
14]. Altogether the data strongly suggest the existence of an avian-mammalian protamine gene line 
during evolution (Fig. 2). 
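
The shared features discussed in this section (the N-terminal ARYR motif, the alternating phosphorylatable site, the presence or absence of cysteine and the arginine content) can be checked on any ungapped sequence with a short Python sketch. Several points are assumptions made for illustration: the function name is hypothetical, "alternating" is read as S/T, any residue, S, any residue, S within the first 15 residues, and the two N-terminal fragments are toy strings patterned on Fig. 1 with gaps removed, not authoritative sequences.

```python
import re


def p1_features(seq):
    """Summarise the features discussed in the text for one ungapped protein sequence."""
    seq = seq.upper()
    return {
        "n_terminal_aryr": seq.startswith("ARYR"),
        # Assumption: the alternating site is S/T-x-S-x-S near the N-terminus.
        "alternating_sts_site": re.search(r"[ST].S.S", seq[:15]) is not None,
        "cysteine_count": seq.count("C"),
        "arginine_fraction": seq.count("R") / len(seq),
    }


# Toy N-terminal fragments (illustrative only, not complete protamines).
eutherian_like = "ARYRCCRSQSRSRC"
bird_like = "ARYRRTRTRSRSRRRR"

for name, fragment in [("eutherian-like", eutherian_like), ("bird-like", bird_like)]:
    print(name, p1_features(fragment))
```

On these toy fragments both share the ARYR start and the alternating site, and only the eutherian-like fragment contains cysteine, which mirrors the distinction drawn in the text.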

A new insight has come from the recent determination of the sequences corresponding to the 
prototherian (the monotremes platypus and echidna) [67] and the metatherian (the 
marsupials) protamines [7, 69, 77, 78]. All these sequences lack cysteine and the corresponding genes 
contain one intron. Thus these species are closer to birds according to their lack of cysteine but 
closer to eutherian mammals according to the presence of the single intron. Detailed phylogenetic 
analysis indicates that these sequences are halfway between eutherian mammals and birds [7, 49, 
67, 77, 78]. Based on the limited (but significant) similarity in the introns between prototherian 
and eutherian mammals it was concluded that the introns in the protamine P1 genes of 
monotremes, marsupials and eutherian mammals were derived from a single intron that was 
probably inserted into the ancestral gene prior to the divergence of the theria and prototheria 150 
to 170 million years ago [67] (Fig. 2). 

Fig. 2. — Possible pathway of mammalian-avian protamine P1 gene evolution. 

The comparison of all protein and DNA sequences further strengthens the idea that 
protamines are amongst the most rapidly diverging proteins studied [68]. This variation may 
allow discrimination of closely related species or even individuals in some cases. For instance, a 
sequence polymorphism has been found in the human P1 gene [62]. Molecular analysis of the P1 
genes from nine primates revealed that within primates the rate of evolutionary change is much 
higher than that within other mammalian orders [68]. Interestingly, the primate P1 data confirm 
that human-gorilla-chimpanzee P1 protamines are indeed very similar but, unlike the slightly 
closer association between chimpanzee and human derived from analysis of other genes [66], the 
human-gorilla relationship is slightly favoured in the case of the P1 genes [68]. 

Overall phylogenetic analysis of all P1 sequences (Table 1) indicates that the molecular 
evolution of P1 genes agrees with the expected species evolution, supporting the view that these 
genes have evolved vertically [61, 83] (Fig. 2). 
Fig. 3. — a: Alignment of all available mammalian P1 gene promoter sequences. An asterisk indicates that a position is 
conserved in all species. The nt positions are referenced relative to the tsp [+1]. The conserved SRE, "TGTGAGG", 
Prot1C and the TATA box are boxed. The arrow over the Prot1C indicates the sequence which is palindromic with 
the "TGTGAGG" sequence (also arrowed). The start codon (ATG) is indicated by a downward arrow. After [64]. 
b: Position of the Prot1C, SRE and "TGTGAGG" sequences present in the P1 genes and position of Prot1C-like 
sequences present in other testis-expressed genes. The putative tsp is indicated [+1]. The numbered open boxes 
indicate the position of sequences identical or similar to Prot1C. The number in the open box indicates the number 
of matches to the 12mer Prot1C [thus, 12 indicates a perfect match]. The beginning of the coding region is shown 
by the unnumbered solid boxes 3' to the tsp. A broken line indicates that the sequence is not available. The SRE is 
indicated by the two connected arrows facing each other. The "TGTGAGG" sequence is shown by a shaded box and 
the position and presence of a TATA box is indicated by the ellipse. After [64]. 

Transcriptional regulation of protamine P1 genes 

Transcription of the avian-mammalian P1 type genes occurs in the post-meiotic, haploid 
stages of spermatogenesis, as determined by Northern blot analysis of RNA from testis at different 
stages of development or from sorted cells [17, 35, 52, 55], and by in situ hybridization [35, 38, 
44, 55]. Run-off assays on isolated mouse nuclei indicate that the mouse P1 gene is activated at 
the round spermatid stage [39]. What are the mechanisms leading to this specific activation? The 
availability of the nucleotide sequences from mouse, human and bull led to the prediction of some 
potential regulatory elements. For instance, a TATA box is present in all of them [18, 24, 32], and a 
CRE-like element [50], a CAT box, CG boxes and additional sequences have also been described 
[18, 24, 32, 50, 52]. A different approach to the prediction of potentially important sequences has 
been the comparison of homologous or heterologous protamine genes in the search for conserved 
regions, on the assumption that important regulatory sequences would have been conserved in 
evolution [50]. Thus the following comparisons were reported: mouse P1 and P2 genes [24]; human, 
bull and mouse P1 genes [33]; human P1 and P2 genes, mouse P1, bull P1 and human P1 [18]; porcine 
genes [26]; and chicken, bull P1, mouse P1 and trout protamine [50]. A common problem in all 
the studies comparing protamine P1 gene sequences was that the homology between the different 
P1 genes available for analysis was relatively high, so that in many cases it was not possible to 
discriminate between conserved regulatory sites and sites conserved simply because of a close 
evolutionary origin. This problem was solved when the promoter regions of additional P1 genes were 
sequenced, thus increasing the total number of sequences available for comparison [64] (Fig. 3). 

Four highly conserved sites were detected in the 5' region (-160 to -1) of the protamine 
genes [64]. The first one (-29 to -35) corresponds to the previously described TATA box, 
but with the novelty of being invariably preceded in all species by the dinucleotide TC (Fig. 3). 
The second conserved region (-55 to -66) was named Protamine 1 Consensus (Prot1C); it is a 
CRE-like sequence (although always different from it). The relevance of this conserved element in the 
expression of the P1 genes is strongly supported by the demonstration of a mouse testis 
trans-acting factor (Tet-1 [75]) whose binding site in the mouse matches the first 11 bp of the 
corresponding Prot1C sequence. Independent experiments using different oligonucleotides 
corresponding to the mouse P1 5' region [82] showed that the region -35 to -70 gave rise to 
three different specific bands in gel retardation assays upon incubation with nuclear 
extracts from different tissues. Only one of those bands was testis-specific. Similar results have 
been obtained with the rat P1 sequence using rat nuclear extracts [65]. The third highly conserved 
region detected in the 5' region of all P1 genes corresponds to the sequence TGTGAGG (-88 to -82). 
This sequence is a palindrome of the seven central nt of Prot1C (binding sequence for factor Tet-1 
in the mouse) and is exclusively present in this region in all P1 genes whose sequence is available, 
suggesting an important function in the differential expression of the protamine P1 genes. This 
region corresponds to box E in the mouse promoter [24]. The fourth highly conserved region 
corresponds to MATGCCCATATWTGGRCAYG and has the typical structure of serum response 
elements (SRE) [64]. This region demonstrates specific interaction with a factor present in rat 
nuclear extracts from different tissues, with a distinctive shifted band appearing in the testis 
extracts [65]. The equivalent region in the mouse P1 gene (described as Box O) also demonstrates 
specific binding with a factor present in mouse nuclear extracts. Thus two highly conserved 
sequences, the Prot1C (also referred to as CRE-like in all P1 sequences or as the Tet-1 binding site in the 
mouse) and the SRE, appear to bind a testis-specific factor or a testis-specific combination of 
factors [65]. Both of these sequences lie within the minimal region (from bp -150 to bp -37) 
described to direct spermatid-specific expression with a heterologous promoter from a human 
growth hormone reporter gene [56, 81, 82]. 
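
The conserved elements described in this section can be located in a promoter sequence with a short Python sketch. It expands the IUPAC codes of the SRE consensus quoted in the text (M = A or C, W = A or T, R = A or G, Y = C or T) into a regular expression, searches for the TGTGAGG element, and computes a reverse complement of the kind needed to test the palindromic relationship with Prot1C. The promoter fragment and the function names are hypothetical and used only for illustration; the Prot1C sequence itself is not given in the text and is therefore not included.

```python
import re

# IUPAC nucleotide codes that occur in the SRE consensus quoted in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "Y": "[CT]",
}


def consensus_to_regex(consensus):
    """Turn a degenerate consensus into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


SRE = consensus_to_regex("MATGCCCATATWTGGRCAYG")

# Hypothetical toy promoter fragment (not a real P1 promoter sequence).
toy_promoter = "GGTT" + "CATGCCCATATATGGACATG" + "AACC" + "TGTGAGG" + "TT"

sre_hit = SRE.search(toy_promoter)
print("SRE-like match at index", sre_hit.start(), sre_hit.group())
print("TGTGAGG at index", toy_promoter.find("TGTGAGG"))
print("reverse complement of TGTGAGG:", reverse_complement("TGTGAGG"))
```

Indices here are 0-based positions within the toy string; mapping them to promoter coordinates relative to the transcription start point would require the real sequence and its annotated start.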